DATASET EXPLORATION

BASIC ANALYSIS

In [ ]:
import numpy as np
import pandas as pd
In [ ]:
df = pd.read_csv('../../Data/01CrudoNoEditar/01desastres_crudo.csv', delimiter=';', encoding='latin-1')
df.head(3)
C:\Users\blanc\AppData\Local\Temp\ipykernel_3912\2290156345.py:1: DtypeWarning: Columns (18,24,25,45) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv('../../Data/01CrudoNoEditar/01desastres_crudo.csv', delimiter=';', encoding='latin-1')
Out[ ]:
          Dis No  Year   Seq  Glide  Disaster Group  Disaster Subgroup  Disaster Type  Disaster Subtype  Disaster Subsubtype  Event Name  ...
0  1900-9002-CPV  1900  9002    NaN         Natural     Climatological        Drought           Drought                  NaN         NaN  ...
1  1900-9001-IND  1900  9001    NaN         Natural     Climatological        Drought           Drought                  NaN         NaN  ...
2  1902-0012-GTM  1902    12    NaN         Natural        Geophysical     Earthquake   Ground movement                  NaN         NaN  ...

   Reconstruction Costs, Adjusted ('000 US$)  Insured Damages ('000 US$)  Insured Damages, Adjusted ('000 US$)  Total Damages ('000 US$)  Total Damages, Adjusted ('000 US$)          CPI  Adm Level  Admin1 Code  Admin2 Code  Geo Locations
0                                        NaN                         NaN                                   NaN                       NaN                                 NaN  2,849084409        NaN          NaN          NaN            NaN
1                                        NaN                         NaN                                   NaN                       NaN                                 NaN  2,849084409        NaN          NaN          NaN            NaN
2                                        NaN                         NaN                                   NaN                   25000.0                            843726.0  2,963047785        NaN          NaN          NaN            NaN

3 rows × 50 columns
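
The DtypeWarning above refers to columns 18, 24, 25 and 45, which df.info() identifies as OFDA Response, Latitude, Longitude and CPI, and the head() output shows CPI written with a decimal comma (for example 2,849084409), so CPI is loaded as text rather than as a number. A minimal sketch of a cleaner load, assuming CPI is the only one of these columns that needs a comma-to-point conversion: low_memory=False makes pandas infer each column's type from the whole file instead of chunk by chunk (which is what triggers the warning), and CPI is then converted explicitly with pd.to_numeric. The helper load_disasters and the two-row sample are hypothetical, made up only to keep the example self-contained.

```python
import io

import pandas as pd


def load_disasters(source):
    """Hypothetical loader: read the semicolon-separated latin-1 export and make CPI numeric."""
    # low_memory=False infers each column's dtype from the whole file at once,
    # so pandas no longer emits the chunk-based DtypeWarning
    df = pd.read_csv(source, delimiter=';', encoding='latin-1', low_memory=False)
    # Assumption: CPI uses a decimal comma (e.g. "2,849084409").
    # astype(str) keeps any values already parsed as numbers, the comma becomes a point,
    # and errors='coerce' turns anything unparseable (including missing values) into NaN
    df['CPI'] = pd.to_numeric(
        df['CPI'].astype(str).str.replace(',', '.', regex=False),
        errors='coerce',
    )
    return df


# Made-up two-row sample in the same layout, encoded as latin-1 bytes like the real file
sample = "Dis No;Year;CPI\n1900-9002-CPV;1900;2,849084409\n1902-0012-GTM;1902;\n"
df_sample = load_disasters(io.BytesIO(sample.encode('latin-1')))
print(df_sample.dtypes)
```

On the notebook's data it would be called as load_disasters('../../Data/01CrudoNoEditar/01desastres_crudo.csv'). Converting only CPI is deliberate: passing decimal=',' to read_csv would change how every numeric column is parsed, and the formats of the other columns (Latitude and Longitude included) are not visible here, so they are left untouched.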

In [ ]:
df.shape
Out[ ]:
(16636, 50)
In [ ]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16636 entries, 0 to 16635
Data columns (total 50 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   Dis No                                     16636 non-null  object 
 1   Year                                       16636 non-null  int64  
 2   Seq                                        16636 non-null  int64  
 3   Glide                                      1736 non-null   object 
 4   Disaster Group                             16636 non-null  object 
 5   Disaster Subgroup                          16636 non-null  object 
 6   Disaster Type                              16636 non-null  object 
 7   Disaster Subtype                           13313 non-null  object 
 8   Disaster Subsubtype                        1117 non-null   object 
 9   Event Name                                 3969 non-null   object 
 10  Country                                    16636 non-null  object 
 11  ISO                                        16636 non-null  object 
 12  Region                                     16636 non-null  object 
 13  Continent                                  16636 non-null  object 
 14  Location                                   14825 non-null  object 
 15  Origin                                     4085 non-null   object 
 16  Associated Dis                             3593 non-null   object 
 17  Associated Dis2                            763 non-null    object 
 18  OFDA Response                              1716 non-null   object 
 19  Appeal                                     2559 non-null   object 
 20  Declaration                                3343 non-null   object 
 21  AID Contribution ('000 US$)                776 non-null    float64
 22  Dis Mag Value                              5064 non-null   float64
 23  Dis Mag Scale                              15416 non-null  object 
 24  Latitude                                   2775 non-null   object 
 25  Longitude                                  2775 non-null   object 
 26  Local Time                                 1156 non-null   object 
 27  River Basin                                1336 non-null   object 
 28  Start Year                                 16636 non-null  int64  
 29  Start Month                                16241 non-null  float64
 30  Start Day                                  13021 non-null  float64
 31  End Year                                   16636 non-null  int64  
 32  End Month                                  15936 non-null  float64
 33  End Day                                    13105 non-null  float64
 34  Total Deaths                               11838 non-null  float64
 35  No Injured                                 4147 non-null   float64
 36  No Affected                                9673 non-null   float64
 37  No Homeless                                2470 non-null   float64
 38  Total Affected                             12143 non-null  float64
 39  Reconstruction Costs ('000 US$)            38 non-null     float64
 40  Reconstruction Costs, Adjusted ('000 US$)  36 non-null     float64
 41  Insured Damages ('000 US$)                 1109 non-null   float64
 42  Insured Damages, Adjusted ('000 US$)       1109 non-null   float64
 43  Total Damages ('000 US$)                   5384 non-null   float64
 44  Total Damages, Adjusted ('000 US$)         5366 non-null   float64
 45  CPI                                        16530 non-null  object 
 46  Adm Level                                  8475 non-null   object 
 47  Admin1 Code                                5030 non-null   object 
 48  Admin2 Code                                4248 non-null   object 
 49  Geo Locations                              8475 non-null   object 
dtypes: float64(17), int64(4), object(29)
memory usage: 6.3+ MB
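
The Non-Null Count column shows how uneven coverage is: Year, Country and Start Year are complete, whereas Reconstruction Costs ('000 US$) has only 38 non-null values out of 16636 rows. A short sketch that turns those counts into a missing-value percentage per column, sorted from sparsest to most complete, to help decide which columns to drop or impute; missing_report is a hypothetical helper and the toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number and percentage of missing values per column."""
    missing = df.isna().sum()
    report = pd.DataFrame({
        'missing': missing,
        'pct_missing': (missing / len(df) * 100).round(2),
    })
    # Sparsest columns first
    return report.sort_values('pct_missing', ascending=False)


# Toy frame for illustration only
toy = pd.DataFrame({
    'Year': [1900, 1900, 1902, 1903],
    'Glide': [None, None, None, 'G-001'],
    'Total Deaths': [10.0, np.nan, 3.0, np.nan],
})
print(missing_report(toy))
```

On the notebook's data the call is simply missing_report(df).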
In [ ]:
from ydata_profiling import ProfileReport

# df is already loaded in an earlier cell, so there is no need to read the CSV again
profile = ProfileReport(df, title='Pandas Profiling Report')
profile
Out[ ]: